Jan 2011

Authors

  • Eric Xing
  • John Lafferty
  • Tom Mitchell
  • Zoubin Ghahramani
  • Alexander J. Smola
Abstract

Online content has become an important medium for disseminating information and expressing opinions. With the proliferation of online document collections, users are faced with the problem of missing the big picture in a sea of irrelevant and/or diverse content. In this thesis, we address the problem of information organization of online document collections, and provide algorithms that create a structured representation of the otherwise unstructured content. We leverage the expressiveness of latent probabilistic models (e.g. topic models) and nonparametric Bayes techniques (e.g. Dirichlet processes), and give online and distributed inference algorithms that scale to terabyte datasets and adapt the inferred representation with the arrival of new documents. Throughout the thesis, we consider two different domains: research publications and social media (news articles and blog posts); and focus on modeling two facets of content: temporal dynamics and structural correspondence.
To model the temporal dynamics of document collections, we introduce a nonparametric Bayes model that we call the recurrent Chinese restaurant process (RCRP). RCRP is a framework for modeling complex longitudinal data, in which the number of mixture components at each time point is unbounded. On top of this process, we develop a hierarchical extension and use it to build an infinite dynamic topic model that recovers the timeline of ideas in research publications. Despite its expressiveness, this model fails to capture the essential element of dynamics in social media: stories. To remedy this, we develop a multi-resolution model that treats stories as first-class objects and combines long-term, high-level topics with short-lived, tightly focused storylines. Inference in the new model is carried out via a sequential Monte Carlo algorithm that processes new documents in real time.
We then consider the problem of structural correspondence in document collections, both across modalities and across communities. In research publications, this problem arises from the multi-modal nature of research papers and the pressing need for systems that can retrieve relevant documents based on any of these modalities (e.g. figures, text, named entities). In social media, this problem arises from the ideological bias of the document’s author, which mixes facts with opinions. For both problems we develop a series of factored models. In research publications, the developed model represents ideas across modalities and as such can solve the aforementioned retrieval problem. In social media, the model contrasts the same idea across different ideologies, and as such can explain the bias of a given document at the topical level and help the user stay informed by providing documents that express alternative views. Finally, we address the problem of inferring users’ intent when they interact with document collections, and how this intent changes over time. The induced user model can then be used to match users with relevant content.

Similar resources

2011; Analyzing Qualitative Content of VOA Farsi Website (Jan 1st 2011 to Jun 1st 2011)

This article aims to identify the elements of US media diplomacy toward the developments of the Islamic awakening in Tunisia, Egypt, Yemen, Bahrain and Libya. To this end, the VOA Farsi website has been studied using qualitative content analysis from Jan 1st 2011 to Jun 1st 2011, in the framework of gatekeeping, agenda-setting and media representation theories. Studying ...

Cyber attacks: awareness

7. Geosynchronous Orbital Ion Cannon source code. Accessed Jan 2011. 8. Moyer, E. ‘Report: FBI seizes server in probe of Wikileaks attacks’. CNET, 1 Jan 2011. Accessed Jan 2011. 9. ‘Jester’s Court’. 10. Leyden, J. ‘Anonymous hacktivists fire ion cannons at Zimbabwe’. The Regi...

Syria in Iranian Press

The main question of this article is how differently the news of the uprisings in Bahrain, Yemen and Syria has been covered in the conservative and reformist press. The theory section reviews agenda-setting and social constructionism and, using the theory of constructivism, examines the role of values, identity and norms in social uprisings. In this article, ten newspaper...

Publication date: 2011